
HL Paper 3
This question is about modelling the spread of a computer virus to predict the number of computers in a city that will be infected by the virus.
A systems analyst defines the following variables in a model:
- is the number of days since the first computer was infected by the virus.
- is the total number of computers that have been infected up to and including day .
The following data were collected:
A model for the early stage of the spread of the computer virus suggests that
where is the total number of computers in a city and is a measure of how easily the virus is spreading between computers. Both and are assumed to be constant.
The data above are taken from city X which is estimated to have million computers.
The analyst looks at data for another city, Y. These data indicate a value of .
An estimate for , can be found by using the formula:
.
The following table shows estimates of for city X at different values of .
An improved model for , which is valid for large values of , is the logistic differential equation
where and are constants.
Based on this differential equation, the graph of against is predicted to be a straight line.
Find the equation of the regression line of on .
Write down the value of , Pearson’s product-moment correlation coefficient.
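As an illustration of the regression line and correlation coefficient asked for here, the following is a minimal sketch in Python. The data values and the names `days` and `infected` are hypothetical, since the question's table is not reproduced in this text; a graphing calculator would normally be used in the exam.

```python
from math import sqrt

def linear_regression(xs, ys):
    """Return (slope, intercept, r) for the least-squares line of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)  # Pearson's product-moment correlation coefficient
    return slope, intercept, r

# Hypothetical data, NOT the values from the question's table.
days = [10, 20, 30, 40]
infected = [20, 90, 400, 1700]
slope, intercept, r = linear_regression(days, infected)
print(f"infected = {slope:.3g} * days + {intercept:.3g}, r = {r:.3f}")
```

With data that grow roughly exponentially, as in this made-up sample, `r` can still be fairly close to 1, which is one reason the later parts compare the linear model with a non-linear one.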
Explain why it would not be appropriate to conduct a hypothesis test on the value of found in (a)(ii).
Find the general solution of the differential equation .
Using the data in the table write down the equation for an appropriate non-linear regression model.
Write down the value of for this model.
Hence comment on the suitability of the model from (b)(ii) in comparison with the linear model found in part (a).
By considering large values of write down one criticism of the model found in (b)(ii).
Use your answer from part (b)(ii) to estimate the time taken for the number of infected computers to double.
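A doubling time follows directly from an exponential model. The sketch below assumes the model from (b)(ii) has either the form a·e^(bt) or the form a·b^t; both forms and the parameter names are assumptions, because the fitted equation is not reproduced in this text.

```python
from math import log

def doubling_time(growth_rate):
    """Doubling time for a model a * exp(growth_rate * t); it does not depend on a."""
    if growth_rate <= 0:
        raise ValueError("growth_rate must be positive for the quantity to double")
    return log(2) / growth_rate

def doubling_time_from_base(base):
    """Doubling time for a model a * base**t, which is the same curve with growth_rate = ln(base)."""
    if base <= 1:
        raise ValueError("base must exceed 1 for the quantity to double")
    return log(2) / log(base)

# Hypothetical growth rate per day, NOT taken from the question.
print(doubling_time(0.15))
```

Setting a·e^(b(t + T)) = 2a·e^(bt) and cancelling gives e^(bT) = 2, so T = ln 2 / b, which is what both functions compute.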
Find in which city, X or Y, the computer virus is spreading more easily. Justify your answer using your results from part (b).
Determine the value of and of . Give your answers correct to one decimal place.
Use linear regression to estimate the value of and of .
The solution to the differential equation is given by
where is a constant.
Using your answer to part (f)(i), estimate the percentage of computers in city X that are expected to have been infected by the virus over a long period of time.
In this question you will explore possible models for the spread of an infectious disease.
An infectious disease has begun spreading in a country. The National Disease Control Centre (NDCC) has compiled the following data after receiving alerts from hospitals.
A graph of against is shown below.
The NDCC want to find a model to predict the total number of people infected, so they can plan for medicine and hospital facilities. After looking at the data, they think an exponential function in the form could be used as a model.
Use your answer to part (a) to predict
The NDCC want to verify the accuracy of these predictions. They decide to perform a goodness of fit test.
The predictions given by the model for the first five days are shown in the table.
In fact, the first day when the total number of people infected is greater than 1000 is day 14, when a total of 1015 people are infected.
Based on this new data, the NDCC decide to try a logistic model in the form .
Use the data from days 1–5, together with day 14, to find the value of
Use an exponential regression to find the value of and of , correct to 4 decimal places.
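One common way to carry out an exponential regression is to fit a straight line to the logarithm of the data. The sketch below assumes the model has the form a·e^(bx); that form, the names `a` and `b`, and the sample data are all assumptions, and a calculator's built-in routine may report the model in a different but equivalent form.

```python
from math import exp, log

def exponential_regression(xs, ys):
    """Fit y = a * exp(b * x) by least squares on the points (x, ln y)."""
    n = len(xs)
    log_ys = [log(y) for y in ys]  # requires every y to be positive
    mean_x = sum(xs) / n
    mean_log_y = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    b = sxy / sxx                      # gradient of the line ln y = ln a + b x
    a = exp(mean_log_y - b * mean_x)   # intercept of that line is ln a
    return a, b

# Hypothetical totals for days 1 to 5, NOT the data from the question.
day = [1, 2, 3, 4, 5]
total = [20, 32, 46, 80, 115]
a, b = exponential_regression(day, total)
print(round(a, 4), round(b, 4))
```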
the number of new people infected on day 6.
the day when the total number of people infected will be greater than 1000.
Use your answer to part (a) to show that the model predicts 16.7 people will be infected on the first day.
Explain why the number of degrees of freedom is 2.
Perform a goodness of fit test at the 5% significance level. You should clearly state your hypotheses, the p-value, and your conclusion.
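The mechanics of the goodness of fit test can be sketched as follows. The observed and expected values are hypothetical, since the question's tables are not reproduced in this text. The p-value function relies on the fact that, for exactly 2 degrees of freedom, the upper-tail probability of the chi-squared distribution is e^(−x/2); it is not valid for other degrees of freedom.

```python
from math import exp

def chi_squared_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_df2(statistic):
    """Upper-tail probability for a chi-squared variable with 2 degrees of freedom."""
    return exp(-statistic / 2)

# Hypothetical values for five days, NOT the data from the question.
observed = [18, 25, 40, 62, 105]
expected = [17, 27, 42, 66, 98]

statistic = chi_squared_statistic(observed, expected)
p_value = p_value_df2(statistic)
# At the 5% level, the null hypothesis (the model fits the data) is rejected if p_value < 0.05.
print(statistic, p_value, "reject H0" if p_value < 0.05 else "do not reject H0")
```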
Give two reasons why the prediction in part (b)(ii) might be lower than 14.
.
.
.
Hence predict the total number of people infected by this disease after several months.
Use the logistic model to find the day when the rate of increase of people infected is greatest.
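The two logistic questions above can be explored numerically. This sketch assumes the model has the common form L / (1 + c·e^(−kd)); the formula in the question is not reproduced in this text, so the form, the symbols L, c, k, and the parameter values are all assumptions.

```python
from math import exp, log

def logistic(d, L, c, k):
    """Total infected on day d under the assumed form L / (1 + c * exp(-k * d))."""
    return L / (1 + c * exp(-k * d))

def growth_rate(d, L, c, k):
    """Rate of increase dn/dd = k * n * (1 - n / L) for the assumed logistic form."""
    n = logistic(d, L, c, k)
    return k * n * (1 - n / L)

def day_of_fastest_growth(c, k):
    """The rate is greatest when n = L / 2, which happens when c * exp(-k * d) = 1."""
    return log(c) / k

# Hypothetical parameters, NOT the values found in the question.
L, c, k = 2000, 99, 0.3
peak_day = day_of_fastest_growth(c, k)
print(peak_day, logistic(peak_day, L, c, k))
# As d grows, exp(-k * d) tends to 0, so the predicted long-term total tends to L.
print(logistic(365, L, c, k))
```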
An estate manager is responsible for stocking a small lake with fish. He begins by introducing fish into the lake and monitors their population growth to determine the likely carrying capacity of the lake.
After one year an accurate assessment of the number of fish in the lake is taken and it is found to be .
Let be the number of fish years after the fish have been introduced to the lake.
Initially it is assumed that the rate of increase of will be constant.
When the estate manager again decides to estimate the number of fish in the lake. To do this he first catches fish and marks them, so they can be recognized if caught again. These fish are then released back into the lake. A few days later he catches another fish, releasing each fish after it has been checked, and finds of them are marked.
Let be the number of marked fish caught in the second sample, where is considered to be distributed as . Assume the number of fish in the lake is .
The estate manager decides that he needs bounds for the total number of fish in the lake.
The estate manager feels confident that the proportion of marked fish in the lake will be within standard deviations of the proportion of marked fish in the sample and decides these will form the upper and lower bounds of his estimate.
The estate manager now believes the population of fish will follow the logistic model where is the carrying capacity and .
The estate manager would like to know if the population of fish in the lake will eventually reach .
Use this model to predict the number of fish in the lake when .
Assuming the proportion of marked fish in the second sample is equal to the proportion of marked fish in the lake, show that the estate manager will estimate there are now fish in the lake.
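The capture-recapture estimate described here equates two proportions: marked fish in the sample over sample size, and marked fish in the lake over total population. A minimal sketch follows, with hypothetical numbers because the question's values are not reproduced in this text.

```python
def estimate_population(marked_in_lake, sample_size, marked_in_sample):
    """Solve marked_in_sample / sample_size = marked_in_lake / population for population."""
    if marked_in_sample == 0:
        raise ValueError("no marked fish recaptured, so the estimate is undefined")
    return marked_in_lake * sample_size / marked_in_sample

# Hypothetical counts, NOT the values from the question.
print(estimate_population(marked_in_lake=50, sample_size=40, marked_in_sample=5))
```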
Write down the value of and the value of .
State an assumption that is being made for to be considered as following a binomial distribution.
Show that an estimate for is .
Hence show that the variance of the proportion of marked fish in the sample, , is .
Taking the value for the variance given in (d) (ii) as a good approximation for the true variance, find the upper and lower bounds for the proportion of marked fish in the lake.
Hence find upper and lower bounds for the number of fish in the lake when .
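The steps from the sample proportion to bounds on the population can be sketched as below. All numbers are hypothetical, including the number of standard deviations, which is not reproduced in this text. The variance used is the standard binomial result p(1 − p)/n for a sample proportion.

```python
from math import sqrt

def proportion_bounds(marked_in_sample, sample_size, num_sd):
    """Bounds p_hat -/+ num_sd standard deviations for the proportion of marked fish."""
    p_hat = marked_in_sample / sample_size
    variance = p_hat * (1 - p_hat) / sample_size  # variance of a binomial sample proportion
    margin = num_sd * sqrt(variance)
    return p_hat - margin, p_hat + margin

def population_bounds(marked_in_lake, lower_p, upper_p):
    """A larger marked proportion implies a smaller population, so the bounds swap over."""
    if lower_p <= 0:
        raise ValueError("lower proportion must be positive to bound the population")
    return marked_in_lake / upper_p, marked_in_lake / lower_p

# Hypothetical values, NOT the numbers from the question.
low_p, high_p = proportion_bounds(marked_in_sample=50, sample_size=100, num_sd=1)
print(low_p, high_p)
print(population_bounds(90, low_p, high_p))
```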
Given this result, comment on the validity of the linear model used in part (a).
Assuming a carrying capacity of use the given values of and to calculate the parameters and .
Use these parameters to calculate the value of predicted by this model.
Comment on the likelihood of the fish population reaching .
Consider the functions , : defined by
and .
Find .
Find .
State with a reason whether or not and commute.
Find the inverse of .
A suitable site for the landing of a spacecraft on the planet Mars is identified at a point, . The shortest time from sunrise to sunset at point must be found.
Radians should be used throughout this question. All values given in the question should be treated as exact.
Mars completes a full orbit of the Sun in Martian days, which is one Martian year.
On day , where , the length of time, in hours, from the start of the Martian day until sunrise at point can be modelled by a function, , where
.
The graph of is shown for one Martian year.
Mars completes a full rotation on its axis in hours and minutes.
The time of sunrise on Mars depends on the angle, , at which it tilts towards the Sun. During a Martian year, varies from to radians.
The angle, , through which Mars rotates on its axis from the start of a Martian day to the moment of sunrise, at point , is given by , .
Use your answers to parts (b) and (c) to find
Let be the length of time, in hours, from the start of the Martian day until sunset at point on day . can be modelled by the function
.
The length of time between sunrise and sunset at point , , can be modelled by the function
.
Let and hence .
can be written in the form , where and are complex functions of .
Show that .
Find the angle through which Mars rotates on its axis each hour.
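The angle per hour is one full turn, 2π radians, divided by the rotation period in hours. The sketch below uses a hypothetical period, since the hours and minutes stated in the question are not reproduced in this text.

```python
from math import pi

def rotation_angle_per_hour(hours, minutes):
    """Angle in radians turned each hour, given the time for one full rotation."""
    period_hours = hours + minutes / 60  # convert the period to hours
    return 2 * pi / period_hours

# Hypothetical rotation period, NOT the value from the question.
print(rotation_angle_per_hour(24, 30))
```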
Show that the maximum value of , correct to three significant figures.
Find the minimum value of .
the maximum value of .
the minimum value of .
Hence show that , correct to two significant figures.
Find the value of .
Find the value of .
Write down and in exponential form, with a constant modulus.
Hence or otherwise find an equation for in the form , where .
Find, in hours, the shortest time from sunrise to sunset at point that is predicted by this model.
This question explores methods to determine the area bounded by an unknown curve.
The curve is shown in the graph, for .
The curve passes through the following points.
It is required to find the area bounded by the curve, the -axis, the -axis and the line .
One possible model for the curve is a cubic function.
A second possible model for the curve is an exponential function, , where .
Use the trapezoidal rule to find an estimate for the area.
With reference to the shape of the graph, explain whether your answer to part (a)(i) will be an overestimate or an underestimate of the area.
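The trapezoidal rule, and how the shape of the curve affects the estimate, can be illustrated with a short sketch. The points used are hypothetical (they lie on y = x²), since the question's table is not reproduced in this text.

```python
def trapezoidal_area(xs, ys):
    """Sum the areas of the trapezoids between consecutive points; xs must be increasing."""
    pairs = list(zip(xs, ys))
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pairs, pairs[1:])
    )

# Hypothetical points on y = x**2, NOT the coordinates from the question.
xs = [0, 1, 2]
ys = [0, 1, 4]
estimate = trapezoidal_area(xs, ys)
exact = 2 ** 3 / 3  # integral of x**2 from 0 to 2
print(estimate, exact)
```

For this concave-up example the chords lie above the curve, so the estimate exceeds the exact area; for a concave-down curve the reverse holds.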
Use all the coordinates in the table to find the equation of the least squares cubic regression curve.
Write down the coefficient of determination.
Write down an expression for the area enclosed by the cubic function, the -axis, the -axis and the line .
Find the value of this area.
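A least squares cubic, its coefficient of determination, and the area under it can be computed as sketched below. In the exam a calculator's cubic regression would be used; this version solves the normal equations directly and is only an illustration. The sample points are hypothetical, since the question's table is not reproduced in this text.

```python
def solve_linear_system(matrix, rhs):
    """Gaussian elimination with partial pivoting, followed by back substitution."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][j] * solution[j] for j in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution

def cubic_regression(xs, ys):
    """Return [c0, c1, c2, c3] for the least squares fit c0 + c1 x + c2 x^2 + c3 x^3."""
    powers = [[x ** j for j in range(4)] for x in xs]
    normal = [[sum(p[i] * p[j] for p in powers) for j in range(4)] for i in range(4)]
    rhs = [sum(p[i] * y for p, y in zip(powers, ys)) for i in range(4)]
    return solve_linear_system(normal, rhs)

def evaluate(coeffs, x):
    """Value of the polynomial with coefficients listed from the constant term upwards."""
    return sum(c * x ** j for j, c in enumerate(coeffs))

def r_squared(xs, ys, coeffs):
    """Coefficient of determination: 1 - (residual sum of squares) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - evaluate(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def polynomial_integral(coeffs, a, b):
    """Exact definite integral of the polynomial from a to b."""
    return sum(c / (j + 1) * (b ** (j + 1) - a ** (j + 1)) for j, c in enumerate(coeffs))

# Hypothetical points, NOT the coordinates from the question.
xs = [0, 1, 2, 3, 4, 5]
ys = [2.0, 3.1, 3.4, 4.9, 8.2, 15.0]
coeffs = cubic_regression(xs, ys)
print(coeffs, r_squared(xs, ys, coeffs), polynomial_integral(coeffs, 0, 5))
```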
Show that .
Hence explain how a straight line graph could be drawn using the coordinates in the table.
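The linearisation behind the straight line graph can be written out as follows. The exponential form in the question is not reproduced in this text, so the form below and the symbols p and q are assumptions used only for illustration.

```latex
% Assumed form (hypothetical symbols p and q): y = p e^{qx}, with p > 0
\begin{align*}
y &= p\,e^{qx} \\
\ln y &= \ln\!\left(p\,e^{qx}\right) \\
\ln y &= \ln p + qx
\end{align*}
```

Under this assumption, plotting ln y against x for the tabulated points gives a straight line with gradient q and vertical intercept ln p, so a linear regression of ln y on x yields estimates of both parameters.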
By finding the equation of a suitable regression line, show that and .
Hence find the area enclosed by the exponential function, the -axis, the -axis and the line .